Probability is a “mathy” word that is also used quite often in the regular world. And the regular-world definition of probability matches up quite well with the “mathy” definition.
Without getting too technical, probability is the chance that something will occur. For instance, there is a 50% probability that a fair coin toss will land “heads”. Or, using the technical jargon:
\[P\left(heads \mid fair\ coin \right) = 50 \% = 0.5\]
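OK, a bit of an explanation of the jargon above: \(P\) stands for “the probability of”, the vertical bar \(\mid\) is read “given that”, and \(50 \% = 0.5\) is the same probability written as a percentage and as a decimal. So the whole line reads: “the probability of getting heads, given a fair coin, is 50%, or 0.5.”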
I did not need to explain above that the probability of getting a heads on a fair coin was 0.5 (or 50%). You just “knew” that.
Similarly, it “just makes sense” that the probability of getting a 2 on a fair 6-sided die roll is \(\frac{1}{6}\).
(Note: I will stop writing “fair” every time I mention a coin or a die. It is important that the coin or die be fair, but hereafter it will be implied / assumed.)
But how does it “just make sense” that the chance to get a 2 on a 6-sided die is \(\frac{1}{6}\)? As we’ve seen in logic and set theory, some ideas that start out easy can quickly get complicated. So, let’s talk about some of the assumptions that make the die roll and the coin toss “obvious”.
Specifically, here are a few attributes that \(P\), the probability of an event occurring, has:
In the final bullet above, it said that we could simply add probabilities that were mutually exclusive. OK, now let’s deal with that statement.
We went through the example:
The probability of a 1 on a die roll is \(\frac{1}{6}\), and the probability of a 2 on a die roll is \(\frac{1}{6}\), so the probability of a 1 or 2 on a die roll is simply \(\frac{1}{6}+\frac{1}{6} = \frac{1}{3}\)
OK that makes total sense, so where’s the problem?
Let’s think about the following scenario:
The probability of a 2 on a die roll is \(\frac{1}{6}\), and the probability of an even number on a die roll is \(\frac{1}{2}\) (\(\frac{1}{6} + \frac{1}{6} + \frac{1}{6}\)), so the probability of a 2 or an even number on a die roll is \(\frac{1}{6}+\frac{1}{2} = \frac{2}{3}\), right?
Well, I’m leading you on. The scenario above is wrong.
Let’s go through it. What does it mean to say the following?
the probability of an even number
This means the probability of either a 2, 4, or 6 occurring. Ahhh… of course, I’ve counted the occurrence of a 2 twice: once when I said the “probability of getting a 2”, and once when I said the “probability of getting an even number”!
OK, so in this case I get it, but how would I handle this in a generic situation, where some outcomes might be counted twice (or more)?
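We don’t actually add the probabilities; we take the union of the outcomes. So, above, instead of saying
\[P\left(2\ or\ even\ number\right) = P\left(2\right) + P\left(even\ number\right)\]
I should have said
\[P\left(2\ or\ even\ number\right) = P\left(\left\{2\right\} \cup \left\{2, 4, 6\right\}\right)\]
How do we figure out the union? We have to specify what the outcomes actually are, then take the union of those outcomes. From our set theory work, we already know that the union is the combination of both sets of outcomes, with any duplicates removed.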
What is the probability to get either a 2 on a die roll or an even number on a die roll?
How can I get a 2 on a die roll? Well, I have to roll a 2. So the set of outcomes that give a 2 is simply \(\left\{2\right\}\).
How can I get an even number on a die roll? I can roll a 2, 4, or 6. So the set of outcomes that give an even number is \(\left\{2, 4, 6\right\}\).
The union of \(\left\{2\right\}\) and \(\left\{2, 4, 6\right\}\) is only \(\left\{2, 4, 6\right\}\) (since a set contains only unique numbers / possibilities).
Drawing this as a Venn Diagram, we can see that there is only 1 way to roll a 2, and 2 ways to roll an even number that is not a 2 (namely, rolling a 4 or a 6). So if we count up the union, it’s \(1 + 2 = 3\).
And, of course, the total set of possible results on a 6-sided die is \(\left\{1, 2, 3, 4, 5, 6\right\}\).
So,
\[P\left(2\ or\ even\ number \mid 6-sided\ die \right) = \] \[\frac{\#\ of\ outcomes\ in \left\{2, 4, 6\right\}}{\#\ of\ outcomes\ in \left\{1, 2, 3, 4, 5, 6\right\}} = \frac{3}{6} = 0.5\]
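Counting outcomes with sets is easy to sketch in code. This is a minimal illustration, assuming a fair six-sided die with every face equally likely. The helper name `prob` is made up for this example and is not from any library.

```python
from fractions import Fraction

# All possible results of one roll of a (fair) six-sided die.
DIE = {1, 2, 3, 4, 5, 6}


def prob(event, sample_space=DIE):
    """Probability of an event, assuming every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


two = {2}
even = {2, 4, 6}

# Naively adding the two probabilities counts the outcome 2 twice.
naive = prob(two) + prob(even)
# Taking the union of the outcome sets first removes the duplicate.
union = prob(two | even)

print(two | even)  # {2, 4, 6}
print(naive)       # 2/3
print(union)       # 1/2
```

Using `Fraction` instead of floats keeps answers such as 1/2 exact, so they match the fractions in the text.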
Remember in set theory that the union can be thought of as the “addition” of two sets, minus the intersection. Let’s set up a different situation, to yield a more interesting Venn Diagram.
\[P\left(\left(< 3\right)\ or\ \left(odd\ number\right)\mid 6-sided\ die \right)\]
(Remember, that reads “probability of getting less than a 3 or getting an odd number, given a six-sided die”)
So, if we draw a Venn Diagram of the outcomes, we clearly see there are 4 total outcomes: 2 outcomes found only in \(\left\{1, 3, 5\right\}\) (the 3 and the 5), 1 outcome found only in \(\left\{1, 2\right\}\) (the 2), and 1 outcome found in both (the 1). We just add up the number of outcomes to correctly reach 4 (\(2 + 1 + 1\)), so the probability is \(\frac{4}{6} = \frac{2}{3}\).
But to the original point about “adding” probabilities: if we “add” the 3 outcomes in \(\left\{1, 3, 5\right\}\) and the 2 outcomes in \(\left\{1, 2\right\}\), we have counted the shared outcome 1 twice, so we must “subtract” the intersection, \(\left\{1\right\}\), once. So it’s \(3 + 2 - 1 = 4\).
Or, using our more formal set theory symbols:
\[P\left(A\ or\ B\right) = P\left(A\right) + P\left(B\right) - P\left(A\cap B\right)\]
which we know from set theory is mathematically identical to:
\[P\left(A\ or\ B\right) = P\left(A\cup B\right)\]
Finally, if \(P\left(A\cap B\right)\) is \(0\), then we have:
\[P\left(A\ or\ B\right) = P\left(A\right) + P\left(B\right)\]
which is what was said early on. So, when we say two events are mutually exclusive, we mean they share no outcomes (\(A\cap B = \emptyset\)), and therefore \(P\left(A\cap B\right) = 0\).
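The inclusion-exclusion formula can be checked directly against outcome counts. Here is a short sketch, again assuming one roll of a fair die with equally likely faces. The function names are made up for this illustration.

```python
from fractions import Fraction

DIE = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event on one fair die roll (equally likely faces)."""
    return Fraction(len(event), len(DIE))


def prob_or(a, b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)


less_than_3 = {x for x in DIE if x < 3}  # {1, 2}
odd = {x for x in DIE if x % 2 == 1}     # {1, 3, 5}

# Inclusion-exclusion agrees with counting the union directly: 4/6 = 2/3.
assert prob_or(less_than_3, odd) == prob(less_than_3 | odd) == Fraction(2, 3)

# Mutually exclusive events share no outcomes, so the intersection term is 0
# and the probabilities simply add: 1/6 + 1/6 = 1/3.
one, two = {1}, {2}
assert one & two == set()
assert prob_or(one, two) == prob(one) + prob(two) == Fraction(1, 3)
```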
OK, when we were “adding” probabilities (or, more accurately, taking the union of outcomes), we were looking at one die roll and the different outcomes of that one roll.
What about when we’re looking at multiple events occurring? For instance:
What is the likelihood of rolling 2 sixes in a row?
You probably already know the answer: it’s \(\frac{1}{36}\).
You probably know how to get that answer: simply multiply the chance for the first event occurring, \(\frac{1}{6}\), by the chance for the second event occurring, also \(\frac{1}{6}\).
Yeah, but let’s delve into why things work that way!
Earlier, when discussing the mathematically rigorous way to define a coin toss, we wrote:
\[P\left(heads \mid fair\ coin \right) = 50 \% = 0.5\]
Remember that whole “given that” phrase? In this case “given that it was a fair coin”, written \(\left( \mid fair\ coin \right)\). But we never really explained why mentioning this was important. Well, let’s delve into it a bit more now.
We will use the same “given that” language to rigorously derive our \(\frac{1}{36}\) answer above.
What’s another way that we can write the following?
What is the likelihood of rolling 2 sixes in a row?
Well, I could split it into two steps. First, what is the likelihood of rolling a six? And second:
What is the likelihood of rolling a six given that I just rolled a six?
Using our rigorous symbol methods:
\[P\left(rolling\ six \mid rolled\ six \right) = \frac{1}{6}\]
Wait, so the probability of rolling a 6 “given that” I’ve already rolled a 6 is still \(\frac{1}{6}\)?? Yep! What are we really saying? We’re simply saying that the chance of rolling a 6 is \(\frac{1}{6}\), whether or not we just rolled a 6 right before it. We’re rolling one die, so the chance of rolling a 6 on that one die is \(\frac{1}{6}\), no matter what happened beforehand.
But, that’s the same as the probability of rolling a 6 without any of this “given that” verbiage! Again, yes, that’s correct. So, what’s the point of using “given that”?
Saying that one more time:
I know the probability of rolling a six: it’s just \(\frac{1}{6}\). So, I can say the probability of rolling the 2nd six, given that I have already rolled the 1st six, is simply \(\frac{1}{6}\). And the probability of rolling the 1st six is \(\frac{1}{6}\), of course.
So I know how to get each of the probabilities now. I know how to get the “first” six and the “second” six. I just need to know how to combine them. Aha!
\[P\left(rolling\ 2\ sixes \right) = P\left(rolling\ six \right) \times P\left(rolling\ six \mid rolled\ six \right)\]
\[P\left(rolling\ 2\ sixes \right) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\]
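To see where the multiplication comes from, we can list every ordered pair of rolls. This sketch assumes two independent rolls of a fair die, so all 36 pairs are equally likely. It also computes the “given that” probability by counting only the pairs whose first roll was a 6.

```python
from fractions import Fraction
from itertools import product

# Every ordered (first roll, second roll) pair: 6 * 6 = 36 outcomes.
rolls = list(product(range(1, 7), repeat=2))

both_six = [r for r in rolls if r == (6, 6)]
p_two_sixes = Fraction(len(both_six), len(rolls))

# Conditional probability by counting: among pairs whose first roll is a 6,
# what fraction also have a 6 on the second roll?
first_six = [r for r in rolls if r[0] == 6]
p_six_given_six = Fraction(sum(1 for r in first_six if r[1] == 6), len(first_six))

p_six = Fraction(1, 6)
print(p_two_sixes)                             # 1/36
print(p_six_given_six)                         # 1/6
print(p_six * p_six_given_six == p_two_sixes)  # True
```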
Let’s break this down again and make sure we understand what happened:
OK, so I still don’t understand the “given that” phrase we’ve been using! It didn’t make any difference in the example above. (In other words, why couldn’t I just write \(P\left(rolling\ six \right)\), since it’s the same thing as \(P\left(rolling\ six \mid rolled\ six \right)\)?)
What’s the difference? They are both \(\frac{1}{6}\), right? Yes, in this case, but not always.
For all the cases we’ve talked about above, we were looking at the probabilities of independent events occurring.
That is, if I rolled a 6 the 1st time, the likelihood that I could roll a 6 the 2nd time is still the same as if I had rolled anything else the 1st time.
Huh?
When I roll 2 sixes, I first roll a six, then I roll another six. Once the 1st 6 has been rolled, the 2nd 6 is exactly as likely as it was before. The 1st one already happened. It’s in the past. We can forget about it.
But won’t that always be true?? No, not for dependent events.
We’ll talk about dependent events more later, but here’s an example.
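Instead of rolling dice, let’s look at playing cards. (It’s fun to stick with games & gambling!) In case you don’t know, a standard deck of cards (sans jokers) has 52 cards: 4 suits (clubs, diamonds, hearts, spades), each with 13 ranks (ace, 2 through 10, jack, queen, king). So there are 4 cards of each rank, including 4 queens.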
Now let’s calculate this one:
What is the likelihood of being dealt two cards, and getting two queens?
Well, there are 4 queens, and there are 52 cards. So…
\[P\left(Getting\ Queen \right) = \frac{4}{52} = \frac{1}{13}\]
and for the moment, let’s imagine it’s like the dice, and the chance is still the same:
\[P\left(Getting\ Queen \mid Got\ Queen \right) = \frac{4}{52} = \frac{1}{13}\]
so:
\[P\left(Getting\ 2\ Queens \right) = P\left(Getting\ Queen \right) \times P\left(Getting\ Queen \mid Got\ Queen \right)\]
\[P\left(Getting\ 2\ Queens \right) = \frac{1}{13} \times \frac{1}{13} = \frac{1}{169} = 0.592 \%\]
Wait, but is it? Clearly not; otherwise I wouldn’t be asking. So, what’s wrong?
Let’s break down each step, and be super rigorous. OK, first:
\[P\left(Getting\ Queen \right)\]
is really
\[P\left(Getting\ Queen \mid Fresh\ Deck \right)\]
(Note: you don’t always have to specify a “given that” if it is obvious from context. I am doing it here in order to be completely clear.)
and in that case:
\[P\left(Getting\ Queen \mid Fresh\ Deck \right) = \frac{4}{52} = \frac{1}{13}\]
But next, here is the trickier one:
\[P\left(Getting\ Queen \mid Got\ Queen \right)\]
Well, let’s think about it. There are only 4 queens (and 52 cards) in the deck. If we’ve already been dealt a queen, then there are only 3 queens left, among only 51 cards. Ahhh… there is our CONDITIONAL probability in action. So now let’s correct things:
\[P\left(Getting\ Queen \mid Got\ Queen \right) = \frac{3}{51} = \frac{1}{17}\]
OK, that’s a pretty major change, from \(\frac{1}{13}\) down to \(\frac{1}{17}\). Let’s see how it impacts our final answer:
\[P\left(Getting\ 2\ Queens \right) = P\left(Getting\ Queen \mid Fresh\ Deck \right) \times P\left(Getting\ Queen \mid Got\ Queen \right)\]
\[P\left(Getting\ 2\ Queens \right) = \frac{1}{13} \times \frac{1}{17} = \frac{1}{221} = 0.452 \%\]
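Dealing without replacement can also be checked by brute force. This is a sketch that assumes a standard 52-card deck and a well-shuffled deal, so every ordered pair of distinct cards is equally likely. The rank and suit labels are only illustrative strings.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) cards

# All ordered two-card deals: the second card comes from the 51 that remain.
deals = list(permutations(deck, 2))  # 52 * 51 = 2652 deals
two_queens = [d for d in deals if d[0][0] == "Q" and d[1][0] == "Q"]
p_two_queens = Fraction(len(two_queens), len(deals))

# Multiplication rule with the conditional second factor.
p_formula = Fraction(4, 52) * Fraction(3, 51)
# The mistaken version that acts as if the first queen went back into the deck.
p_with_replacement = Fraction(4, 52) * Fraction(4, 52)

print(len(two_queens), len(deals))  # 12 2652
print(p_two_queens)                 # 1/221
print(p_formula == p_two_queens)    # True
print(p_with_replacement)           # 1/169
```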
The last topic to cover before we can attack the cool Birthday Problem below is how to frame a problem.
Above we looked at the chance of rolling 2 sixes in a row as \(\frac{1}{36}\) (\(\frac{1}{6} \times \frac{1}{6}\)). But, how about this problem:
What’s the chance of rolling at least 1 six out of 2 rolls (on a six-sided die)?
Wow! How do we attack that? It seems quite challenging. There are some conditionals we haven’t discussed. It seems like we need some new knowledge.
We don’t! We just need to rephrase things. First, let’s try this:
What’s the chance of not rolling a six (on a six-sided die)?
Well, we could roll a 1, 2, 3, 4, or 5. So, the chance is simply \(\frac{5}{6}\). But there’s actually another way to phrase it. We could also say it’s the opposite of the chance to roll a 6.
Saying “opposite” in this case is a bit sloppy. To be more mathematically precise, we would say it is the complement of the chance of rolling a 6.
So, if the chance to roll a 6 is \(\frac{1}{6}\), then the chance to not roll a six is \(1 - \frac{1}{6} = \frac{5}{6}\). In other words:
Complementary Probabilities:
\[P\left(not\ A\right) = 1 - P\left(A\right)\]
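So, back to our question. The chance of rolling at least 1 six out of 2 rolls is the complement of rolling no sixes at all. The chance of no six is \(\frac{5}{6}\) on each (independent) roll, so:
\[P\left(at\ least\ 1\ six \right) = 1 - \left(\frac{5}{6} \times \frac{5}{6}\right) = 1 - \frac{25}{36} = \frac{11}{36}\]
Alright! Armed with this knowledge, we have everything we need to tackle the birthday problem.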
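As a sanity check on the complement rule, this sketch counts the two-roll outcomes directly and compares the result with \(1 - \left(\frac{5}{6}\right)^2\). It assumes two independent rolls of a fair die.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

# Direct count: pairs containing at least one six.
at_least_one = Fraction(sum(1 for r in rolls if 6 in r), len(rolls))

# Complement: 1 - P(no six on either roll) = 1 - (5/6) * (5/6).
via_complement = 1 - Fraction(5, 6) ** 2

print(at_least_one)                    # 11/36
print(at_least_one == via_complement)  # True
```

The direct count works here because there are only 36 pairs. The complement version still works for problems where listing every outcome is impractical.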
Now onto a fun puzzle where we can use all of what we’ve learned here.
What’s the chance that (at least) 2 students in a class of 25 have the same birthday?
Let’s not throw in all the fancy math we have learned yet, and just “spitball it”. Well, gosh, there are 365 days in a year (usually), and only 25 students. It just seems really unlikely that any students would share a birthday, right?
Who knows what the right answer is, but my gut tells me maybe 10% chance? Maybe 20% at best??
Let’s first start to break the problem down using our probability verbiage, so that we can then use our newfound probability knowledge.
Let’s make the problem a bit smaller, so that we can think about it more easily. If we were to say “What’s the chance that 2 students in a class of 1 have the same birthday?” that doesn’t really make sense; there is only 1 person in the class! So, the simplest we can make it is:
What’s the chance that 2 students in a class of 2 have the same birthday?
In this case, “all” of the students would need to have the same birthday. But notice that it doesn’t matter which day is their birthday, just that they share the same day. In other words, the day of the first birthday chosen doesn’t really matter.
We’ll come back to that idea of how to handle which day is their shared birthday in a moment. First, let’s jump to the quickest solution to this simple problem:
What we really want is to find the chance that the 2 students share their birthday. As mentioned above, all that matters is whether the second student has the same birthday as the first.
\[P\left(Birthday\ On\ "Occupied"\ Day \mid 1\ Previous\ Student\right) = \frac{1}{365}\]
In other words: the chance that 2 students share the same birthday in a class of 2 is just the chance that the 2nd student’s birthday is on the same day as the 1st student’s birthday. The first student’s birthday will only take up 1 day (out of 365), so the chance is \(\frac{1}{365}\) that the second student’s birthday is on the same day as the first student’s.
Before we leave this simple problem, let’s re-phrase it using the complement restructuring that we just saw in the last section.
Instead of saying “what’s the chance the 2 students share the same birthday?”, let’s say:
What is the chance that 2 students out of a class of 2 do not share the same birthday? (Then take the complement of this answer.)
Why do we want to do that? Because it will make it easier to frame the problem when we add more students.
If we frame the complement problem, we have two probabilities that we need to find and multiply: the chance that the first student’s birthday falls on an “unoccupied” day, and the chance that the second student’s birthday does too.
Remember that we said the day of the first birthday chosen doesn’t really matter. In probability jargon, the probability that the first birthday falls on a day that doesn’t already have a birthday on it is 100% (remember, it’s the first one we’re looking at!):
\[P\left(First\ Birthday\ On\ "Unoccupied"\ Day \right) = 1\]
The second birthday still has a very high chance of falling on an unoccupied day, but not 100%. It has a \(\frac{364}{365}\) chance of falling on an unoccupied day (since the one remaining day is, of course, already taken by the first student’s birthday).
And to clean things up, let’s frame this probability using our “given that” formulations:
\[P\left(Birthday\ On\ "Unoccupied"\ Day \mid 1\ Previous\ Student\right) = \frac{364}{365}\]
So, the full probability is:
\[P\left(All\ Birthdays\ On\ "Unoccupied"\ Days \right) =\] \[P\left(Birthday\ On\ "Unoccupied"\ Day \mid 0\ Previous\ Students\right) \times\] \[P\left(Birthday\ On\ "Unoccupied"\ Day \mid 1\ Previous\ Student\right)\]
That’s awfully wordy, so let’s use some abbreviations and say:
\[P\left(None\ Shared \right) = P\left(Bday \mid 0 \right) \times P\left(Bday \mid 1 \right)\]
And remember that:
\[P\left(None\ Shared \right) = 1- P\left(All\ Shared \right)\]
Finally, putting in some numbers:
\[P\left(All\ Shared \right) = 1 - \left(1 \times \frac{364}{365} \right) = \frac{1}{365}\]
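The two-student calculation translates directly into code. This sketch makes the same simplification as the text: 365 equally likely birthdays, ignoring February 29 and any seasonal patterns.

```python
from fractions import Fraction

DAYS = 365

# First student: every day is "unoccupied", so the probability is 1.
p_first = Fraction(DAYS, DAYS)
# Second student: 364 of the 365 days are still unoccupied.
p_second = Fraction(DAYS - 1, DAYS)

p_none_shared = p_first * p_second
p_shared = 1 - p_none_shared

print(p_none_shared)  # 364/365
print(p_shared)       # 1/365
```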
Now that we’ve finally rigorously (and usefully) structured the simpler problem, it’s relatively easy to structure the full problem. (This is often the case in probability… it’s all about understanding how to frame the problem; how to translate the ideas into rigorous math.)
By the time we’d finished the simpler problem, we had found a way to handle multiple events occurring: the chance of both the first and second students having “unoccupied” birthdays (even though the first student’s chance was 1).
If we can find the chance that all 25 students would have unoccupied birthdays, then we can take the complement, and we have the chance that at least 1 student shares a birthday with at least 1 other. (Notice that we won’t find how many students share birthdays, whether 3 students share the same birthday, or anything else. We are merely finding that at least 1 shares with at least 1 other.)
Here we go:
\[P\left(None\ Shared \right) = P\left(Bday \mid 0 \right) \times P\left(Bday \mid 1 \right) \times \dotsm \times P\left(Bday \mid 24 \right)\]
We have 25 students, so we will need 25 probabilities, and each student’s probability is calculated given the birthdays of all the students before them. Hence, the “givens” range from 0 to 24.
Also, since we require every birthday to land on an unoccupied day, no two of the birthdays so far can coincide, so the number of available days shrinks by one with each new student. In other words, the formula above becomes:
\[P\left(None\ Shared \right) = 1 \times \frac{364}{365} \times \dotsm \times \frac{341}{365}\]
You can go ahead and calculate that; rounded to four decimal places, you’ll come to:
\[P\left(None\ Shared \right) \approx 0.4313\]
And finally, switching back to the answer we really want (the chance that at least 2 students share a birthday):
\[P\left(At\ Least\ 2\ Share \right) = 1 - P\left(None\ Shared \right) \approx 0.5687\]
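The same product extends to any class size. Here is a minimal sketch under the same assumptions (365 equally likely birthdays, no leap days). The function names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def p_none_shared(n, days=365):
    """Chance that n people all have different birthdays.

    Student k (counting from 0) must land on one of the days - k days
    not yet taken by the k students before them.
    """
    p = Fraction(1)
    for k in range(n):
        p *= Fraction(days - k, days)
    return p


def p_at_least_one_shared(n, days=365):
    """Complement: at least two of the n people share a birthday."""
    return 1 - p_none_shared(n, days)


print(round(float(p_none_shared(25)), 4))          # 0.4313
print(round(float(p_at_least_one_shared(25)), 4))  # 0.5687
print(p_at_least_one_shared(2))                    # 1/365

# Smallest class size where a shared birthday is more likely than not.
print(next(n for n in range(1, 366) if p_at_least_one_shared(n) > Fraction(1, 2)))  # 23
```

Calling the function with `n = 2` reproduces the two-student answer of \(\frac{1}{365}\), which is a quick check that the general version matches the simpler derivation.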